Mel Frequency Cepstral Coefficients: An Evaluation of Robustness of MP3 Encoded Music
Authors
Abstract
In large MP3 databases, files are typically generated with different parameter settings, i.e., different bit rates and sampling rates. This is of concern for MIR applications, as encoding differences can potentially confound meta-data estimation and similarity evaluation. In this paper we discuss the influence of MP3 coding on the Mel frequency cepstral coefficients (MFCCs). The main result is that the widely used subset of the MFCCs is robust at bit rates equal to or higher than 128 kbits/s, for the implementations we have investigated. However, for lower bit rates, e.g., 64 kbits/s, the implementation of the Mel filter bank becomes an issue.
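For readers unfamiliar with the pipeline the abstract refers to, the following is a minimal sketch of one common way to compute MFCCs for a single audio frame: power spectrum, triangular Mel filter bank, logarithm, then a discrete cosine transform. It is not taken from the paper and is not one of the implementations it evaluates. The function names, the Hamming window, the 40-filter bank, the 13 retained coefficients, the `2595 * log10(1 + f / 700)` Mel formula, and the `f_max` parameter are all illustrative assumptions.

```python
import numpy as np


def hz_to_mel(f_hz):
    # One widely used Mel-scale formula (an assumption; other variants exist).
    return 2595.0 * np.log10(1.0 + np.asarray(f_hz, dtype=float) / 700.0)


def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_fft, sample_rate, f_min=0.0, f_max=None):
    """Triangular filters with centres equally spaced on the Mel scale."""
    if f_max is None:
        f_max = sample_rate / 2.0
    # n_filters triangles need n_filters + 2 edge frequencies.
    edges_hz = mel_to_hz(
        np.linspace(hz_to_mel(f_min), hz_to_mel(f_max), n_filters + 2)
    )
    bin_hz = np.fft.rfftfreq(n_fft, d=1.0 / sample_rate)
    bank = np.zeros((n_filters, bin_hz.size))
    for i in range(n_filters):
        lo, mid, hi = edges_hz[i], edges_hz[i + 1], edges_hz[i + 2]
        rising = (bin_hz - lo) / (mid - lo)
        falling = (hi - bin_hz) / (hi - mid)
        # The triangle is the lower of the two slopes, floored at zero.
        bank[i] = np.clip(np.minimum(rising, falling), 0.0, None)
    return bank


def dct_ii(x, n_out):
    """Unnormalised DCT-II of a 1-D vector, keeping the first n_out terms."""
    n = x.shape[-1]
    k = np.arange(n_out)[:, None]
    basis = np.cos(np.pi * k * (np.arange(n) + 0.5) / n)
    return x @ basis.T


def mfcc_frame(frame, sample_rate, n_filters=40, n_coeffs=13, f_max=None):
    """MFCCs of one frame; the FFT length equals the frame length."""
    n_fft = frame.size
    power = np.abs(np.fft.rfft(frame * np.hamming(n_fft))) ** 2
    bank = mel_filter_bank(n_filters, n_fft, sample_rate, f_max=f_max)
    # A small constant avoids log(0) for bands with no energy.
    log_energies = np.log(bank @ power + 1e-10)
    return dct_ii(log_energies, n_coeffs)
```

As a usage example, `mfcc_frame(frame, 44100)` on a 1024-sample NumPy array returns 13 coefficients. Passing a different, hypothetical `f_max` changes where the upper filters sit, which is one way two filter bank implementations can differ.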
Similar resources
Shape-based Spectral Contrast Descriptor
Mel-frequency cepstral coefficients are used as an abstract representation of the spectral envelope of a given signal. Although they have been shown to be a powerful descriptor for speech and music signals, more accurate and easily interpretable options can be devised. In this study, we present and evaluate the shape-based spectral contrast descriptor, which is built up from the previously prop...
Noise-Robust Speech Features Based on Cepstral Time Coefficients
In this paper, we investigate the noise-robustness of features based on the cepstral time coefficients (CTC). By cepstral time coefficients, we mean the coefficients obtained from applying the discrete cosine transform to the commonly used mel-frequency cepstral coefficients (MFCC). Furthermore, we apply temporal filters used for computing delta and acceleration dynamic features to the CTC, res...
Singing/humming System through Query Proportion
Query by Singing/Humming (QBSH) is a Music Information Retrieval (MIR) system that takes a small audio excerpt as the query. The rising availability of digital music calls for effective music retrieval methods. Further, MIR systems support content-based searching for music and require no musical acquaintance. Current work on QBSH focuses mainly on melody features such as pitch, rhythm, note etc., size of...
An Extensive Analysis of Query by Singing/Humming System Through Query Proportion
Query by Singing/Humming (QBSH) is a Music Information Retrieval (MIR) system that takes a small audio excerpt as the query. The rising availability of digital music calls for effective music retrieval methods. Further, MIR systems support content-based searching for music and require no musical acquaintance. Current work on QBSH focuses mainly on melody features such as pitch, rhythm, note etc., size of...
Improving the noise-robustness of mel-frequency cepstral coefficients for speech processing
In this paper we study the noise-robustness of mel-frequency cepstral coefficients (MFCCs) and explore ways to improve their performance in noisy conditions. Improvements based on a more accurate model of the early auditory system are suggested to make the MFCC features more robust to noise while preserving their class discrimination ability. Speech versus non-speech classification and speech r...